
    SPsimSeq : semi-parametric simulation of bulk and single-cell RNA-sequencing data

    SPsimSeq is a semi-parametric simulation method for generating bulk and single-cell RNA-sequencing data. It is designed to simulate gene expression data that retain as many characteristics of the real data as possible. It is flexible enough to accommodate a wide range of experimental scenarios, including different sample sizes, biological signals (differential expression) and confounding batch effects.

    On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments

    Background: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or a lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation and the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy of RNA sample pooling strategies has not yet been evaluated or empirically validated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets. Results: The data-generating model in pooled experiments is defined mathematically to evaluate the mean and variability of gene expression estimates. The model is further used to examine the trade-off between the statistical power of testing for DGE and the data-generating costs. Pooling strategies are assessed empirically through analysis of RNA-seq datasets under various pooling and non-pooling experimental settings. A simulation study is also used to rank experimental scenarios with respect to the rates of false and true discoveries in DGE analysis. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined. Conclusion: When within-group gene expression variability is high, small RNA sample pools are effective in reducing the variability and compensating for the reduced number of replicates. Unlike typical cost-saving strategies, such as reducing sequencing depth or the number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction in the total cost of the experiment. In general, pooling RNA samples, or pooling RNA samples in conjunction with a moderate reduction of the sequencing depth, can be a good option to optimize the cost and maintain the power.
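
The trade-off described in this abstract can be illustrated with a minimal sketch. It assumes a simple additive variance model (independent biological and technical variance components, equal contribution of every sample to its pool), which is a textbook simplification and not necessarily the model defined in the paper. The function names, variance values and cost figures below are hypothetical.

```python
def pooled_mean_variance(sigma2_bio, sigma2_tech, n_pools, pool_size):
    """Variance of a group-mean expression estimate under an ASSUMED additive model.

    Each pool averages `pool_size` individuals, so the biological variance of a
    pool is sigma2_bio / pool_size; every sequenced pool adds sigma2_tech.
    Averaging over `n_pools` pools divides the total by n_pools.
    """
    if n_pools < 1 or pool_size < 1:
        raise ValueError("n_pools and pool_size must be positive integers")
    return (sigma2_bio / pool_size + sigma2_tech) / n_pools


def experiment_cost(n_pools, pool_size, cost_per_sample, cost_per_library):
    """Total cost: every individual sample is prepared, every pool is sequenced once."""
    return n_pools * (pool_size * cost_per_sample + cost_per_library)


if __name__ == "__main__":
    # Hypothetical inputs: high biological variance, small technical variance.
    designs = {"6 unpooled": (6, 1), "3 pools of 2": (3, 2), "3 pools of 4": (3, 4)}
    for label, (n_pools, pool_size) in designs.items():
        var = pooled_mean_variance(1.0, 0.1, n_pools, pool_size)
        cost = experiment_cost(n_pools, pool_size, 50, 300)
        print(f"{label}: variance={var:.4f}, cost={cost}")
```

Under these assumed numbers, three pools of four samples give a smaller variance of the group mean than six unpooled samples at a lower total cost. The sketch ignores that fewer pools also leave fewer degrees of freedom for estimating the variance, which affects the power of a DGE test.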

    Differential gene expression analysis tools exhibit substandard performance for long non-coding RNA-sequencing data

    Background: Long non-coding RNAs (lncRNAs) are typically expressed at low levels and are inherently highly variable. This is a fundamental challenge for differential expression (DE) analysis. In this study, the performance of 25 pipelines for testing DE in RNA-seq data is comprehensively evaluated, with a particular focus on lncRNAs and low-abundance mRNAs. Fifteen performance metrics are used to evaluate DE tools and normalization methods using simulations and analyses of six diverse RNA-seq datasets. Results: Gene expression data are simulated using non-parametric procedures in such a way that realistic levels of expression and variability are preserved in the simulated data. Throughout the assessment, results for mRNA and lncRNA were tracked separately. All the pipelines exhibit inferior performance for lncRNAs compared to mRNAs across all simulated scenarios and benchmark RNA-seq datasets. The substandard performance of DE tools for lncRNAs applies also to low-abundance mRNAs. No single tool uniformly outperformed the others. Variability, number of samples, and fraction of DE genes markedly influenced DE tool performance. Conclusions: Overall, linear modeling with empirical Bayes moderation (limma) and a non-parametric approach (SAMSeq) showed good control of the false discovery rate and reasonable sensitivity. Of note, for achieving a sensitivity of at least 50%, more than 80 samples are required when studying expression levels in realistic settings such as in clinical cancer research. About half of the methods showed a substantial excess of false discoveries, making these methods unreliable for DE analysis and jeopardizing reproducible science. The detailed results of our study can be consulted through a user-friendly web application, giving guidance on selection of the optimal DE tool (http://statapps.ugent.be/tools/AppDGE/).

    Statistical methods for testing differential gene expression in bulk and single-cell RNA sequencing data

    The transcriptome is the complete set of RNA molecules in a biological sample, and its composition varies across tissues and cells of the same individual. Studying the transcriptome enables understanding of genome function in response to biological or experimental factors, such as disease and development. RNA sequencing (RNA-seq) uses massively parallel sequencing technologies to profile the transcriptome. One objective of many RNA-seq studies is to identify features of the transcriptome (such as genes) that are differentially expressed (DE) between or among groups of individuals or tissues that differ with respect to some condition (e.g. disease status and treatment). However, because of technical artefacts in RNA-seq technologies and the dynamics of the biological system, the profiling (quantification) of the transcriptome is typically subject to inherent variation even within the same condition. Consequently, statistical methods are principled approaches that help biologists understand to what extent (on average) genes are DE using RNA-seq datasets. In this doctoral thesis, challenges associated with DE analysis of RNA-seq data are explored, and various solutions are proposed. In particular, existing statistical methods for DE analysis are comprehensively evaluated, a novel simulation tool for RNA-seq data is proposed to facilitate realistic evaluation of statistical methods for DE analysis, a new method is proposed to improve the existing distribution-free methods for DE analysis, and cost-effective experimental designs for RNA-seq studies are explored. The findings discussed in this dissertation are relevant to all scientists and clinicians involved in RNA-seq studies, in particular for testing DE, benchmarking statistical and bioinformatics tools, and designing cost-effective RNA-seq experiments.

    Evaluation of antagonistic activities of Trichoderma isolates against Fusarium wilt (Fusarium oxysporum) of tomato (Lycopersicon esculentum Mill.) isolates

    The study was initiated with the objective of controlling tomato wilt disease (Fusarium oxysporum) using Trichoderma isolates as biocontrol agents. F. oxysporum was isolated from diseased tomato plants grown in five selected kebeles of Dugda Bora and Adami Tulu Jido-Kombolcha woredas of the Central Rift Valley (CRV) region of Ethiopia. The pathogenicity of F. oxysporum was determined on three tomato varieties, namely Cochoro, Miya and Fetane, that were grown in a greenhouse in 20 cm plastic pots containing 3 kg of autoclaved soil. The antagonistic effect of Trichoderma isolates against the test pathogen was tested under both in vitro and in vivo conditions. Of the three tomato varieties, Miya was more susceptible to F. oxysporum infection than Cochoro and Fetane. The Trichoderma isolates AUT9, AUT8 and AUT10 inhibited the mycelial growth of the F. oxysporum isolate by 66%, 61% and 58%, respectively. All Trichoderma isolates achieved maximum mycelial growth at 25°C and minimum mycelial growth at 15°C. From the current comparative in vitro and in vivo (greenhouse) studies, it is evident that the most effective antagonist of F. oxysporum among the Trichoderma isolates was AUT9 and the most resistant tomato variety was Fetane. Keywords: Antagonism, Antibiosis, Pathogen, Pesticide, Resistance, Sympto

    Automated quality control tool for high-content imaging data by building 2D prediction intervals on reference biosignatures

    Recent advances in automated microscopy and image analysis enable quantitative profiling of cellular phenotypes (Cell Painting). This paves the way for studying the broad effects of chemical perturbations on biological systems at large scale during lead optimization. Comparison of perturbation biosignatures with biosignatures of annotated compounds can inform on both on- and off-target effects. When building databases with phenotypic profiles of thousands of compounds, it is vital to control the quality of Cell Painting assays over time. To our knowledge, no tool for this yet exists within the imaging community. In this paper, we introduce an automated tool to assess the quality of Cell Painting assays by quantifying the reproducibility of biosignatures of annotated reference compounds. The tool learns the biosignature of those treatments from a historical dataset and subsequently builds a two-dimensional probabilistic quality control (QC) limit. The limit is then used to detect aberrations in new Cell Painting experiments. The tool is illustrated using simulated data and further demonstrated on Cell Painting data of the A549 cell line. In general, the tool provides a sensitive, detailed and easy-to-interpret mechanism to validate the quality of Cell Painting assays.
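
The idea of a two-dimensional probabilistic QC limit learned from historical data can be sketched as follows. This is only an illustration under strong assumptions: each reference-compound biosignature is summarized by two scores (hypothetical here), the scores are treated as bivariate normal, and the limit is an ellipse defined by the Mahalanobis distance with a chi-square cutoff. The abstract does not state how the tool constructs its limit, and all function names are made up for this example.

```python
import math


def fit_reference(points):
    """Estimate the mean and covariance of historical 2D scores (list of (x, y))."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def qc_threshold(level=0.95):
    """Chi-square quantile with 2 degrees of freedom, which has the closed form -2*ln(1 - level)."""
    return -2.0 * math.log(1.0 - level)


def within_limit(point, mean, cov, level=0.95):
    """True if a new experiment's 2D score lies inside the assumed elliptical QC limit."""
    return mahalanobis_sq(point, mean, cov) <= qc_threshold(level)


if __name__ == "__main__":
    historical = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # toy historical scores
    mean, cov = fit_reference(historical)
    print(within_limit((0.5, 0.5), mean, cov))  # close to the reference cloud
    print(within_limit((3.0, 3.0), mean, cov))  # far from the reference cloud
```

The chi-square cutoff treats the estimated mean and covariance as known, which is reasonable only for a large historical dataset; with few historical runs, a wider limit would be needed.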